Author: Leonardo Espin
Date: 1/10/2019
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg #to work with raster images
%matplotlib inline
XDF contains image data that has been flattened to vectors: 5000 training examples of 20x20 pixel images (each a vector of 400 elements). yDF contains the labels for each image; the images corresponding to zero have been labeled as 10. The weights of the hidden layer are in theta1DF, and the weights of the output layer are in theta2DF. This is a subset of the MNIST handwritten digit dataset.
XDF=pd.read_csv('ex3data1-X.csv',header=None)
yDF=pd.read_csv('ex3data1-y.csv',header=None)
theta1DF=pd.read_csv('ex3weights-T1.csv',header=None)
theta2DF=pd.read_csv('ex3weights-T2.csv',header=None)
print(XDF.shape)
XDF.head()
print(theta1DF.shape)
print(theta2DF.shape)
theta2DF.head()
The images correspond to hand-drawn digits (0 to 9), and they can be shown with the imshow command:
tmp=XDF.iloc[0,:].values.reshape(20,20)
plt.imshow(tmp);
Below I show a mosaic of 100 images selected at random from the training samples (notice that the bitmap arrays have to be transposed to be shown correctly):
import random
#select 100 images (rows) at random; randint is inclusive, so valid row indices are 0-4999
selection=[random.randint(0,4999) for x in range(100)]
image=np.zeros((20*10, 20*10)) #for constructing a mosaic of 10x10 images
coords=[(x,y) for x in range(1,11) for y in range(1,11)]
for k,tup in enumerate(coords):
    indYa=0+20*(tup[0]-1)
    indYb=19+20*(tup[0]-1)
    indXa=0+20*(tup[1]-1)
    indXb=19+20*(tup[1]-1)
    image[indYa:indYb+1,indXa:indXb+1]=XDF.iloc[selection[k],:].values.reshape(20,20).transpose()
plt.figure(figsize=(8,8))
plt.imshow(image);
The structure of the trained neural network is shown below: an input layer with 400 units (plus a bias unit), one hidden layer with 25 units (plus a bias unit), and an output layer with 10 units.
Below I apply the neural network to the training set in XDF. Note that columns or rows of ones have to be added to the flattened images to account for the bias units.
#add a column (axis=1) of ones (3rd argument) to the image data (1st argument) at
#the beginning of the matrix (2nd argument)
X=np.insert(XDF.values, 0, 1, axis=1)
X[0:5,0:15]
Multiplying each image by the hidden-layer weights to obtain the $z^{(2)}$ values:
Z2=np.matmul(theta1DF.values,X.transpose())
print(Z2.shape)
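For reference, the full feedforward pass computed in the next few cells is, in matrix notation,

$$z^{(2)}=\Theta^{(1)}a^{(1)},\qquad a^{(2)}=g\left(z^{(2)}\right),\qquad z^{(3)}=\Theta^{(2)}a^{(2)},\qquad a^{(3)}=g\left(z^{(3)}\right),$$

where $g$ is the sigmoid function $g(z)=1/(1+e^{-z})$, $a^{(1)}$ are the images with the bias unit prepended, and $\Theta^{(1)}$, $\Theta^{(2)}$ are the weight matrices in theta1DF and theta2DF.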
The values $a^{(2)}_i$, $i=1,\dots,25$ are obtained by applying the sigmoid function (notice that the function is vectorized and applied simultaneously to the whole Z2 matrix):
def g(z):
    return 1/(1+np.exp(-z)) #np.exp broadcasts, so g is already vectorized and works elementwise on arrays
A2=g(Z2)
print(A2.shape)
A2[0:3,0:5]
A row of ones is added to the matrix A2 to account for the bias unit:
A2=np.insert(A2, 0, 1, axis=0)
A2[0:4,0:5]
Below are the calculations for the output layer, which has 10 nodes corresponding to the 10 categories of the hand-written digits:
Z3=np.matmul(theta2DF.values,A2)
A3=g(Z3)
print(A3.shape)
print('classification results of first 4 images (choose max value per column):')
A3[0:10,0:4]
classification=np.argmax(A3,axis=0)
classification
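Note that the argmax returns a row index from 0 to 9, while the labels in yDF run from 1 to 10, with 10 encoding the digit zero. A minimal sketch of mapping the indices back to actual digits (predicted_digits is an illustrative name, not used elsewhere):
#map argmax row indices (0-9) to labels (1-10); label 10 encodes the digit 0
predicted_labels=classification+1
predicted_digits=np.where(predicted_labels==10,0,predicted_labels)
predicted_digits[0:10]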
Below I show a few classification results chosen at random:
import time
for _ in range(4):
    k=random.choice(selection) #pick one of the 100 previously selected rows
    print('learned value = {}'.format(classification[k]+1))
    tmp=XDF.iloc[k,:].values.reshape(20,20).transpose()
    plt.imshow(tmp,animated=True)
    plt.show()
    time.sleep(1.5)
The overall classification accuracy is:
accuracy=(100*sum(classification.reshape(yDF.values.shape) == yDF.values-1)
          /len(classification))[0]
print('classification accuracy: {}%'.format(accuracy))
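For reference, the feedforward steps above can be collected into a single helper; this is just a sketch (predict_nn is a hypothetical name) that reproduces the same computations:
#sketch: the full feedforward pass as one function; returns the argmax
#row index (0-9) for each image
def predict_nn(theta1,theta2,X):
    A1=np.insert(X, 0, 1, axis=1) #add the bias column to the images
    A2=g(np.matmul(theta1,A1.transpose())) #hidden-layer activations
    A2=np.insert(A2, 0, 1, axis=0) #add the bias row
    A3=g(np.matmul(theta2,A2)) #output-layer activations
    return np.argmax(A3,axis=0)
#this should reproduce the classification array computed above
(predict_nn(theta1DF.values,theta2DF.values,XDF.values)==classification).all()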
from sklearn.neural_network import MLPClassifier
#adam is the default solver and works well on relatively large datasets;
#for small sets like this one 'lbfgs' can converge faster ('sgd' is another option)
clf = MLPClassifier(solver='adam', #adam is the default
                    alpha=1e-3, #reg. parameter lambda =1/(2*500)
                    hidden_layer_sizes=(25,), #1 hidden layer with 25 units
                    activation='logistic', #the logistic sigmoid function, could change to relu
                    #max_iter=400,
                    validation_fraction=0.2)
clf.fit(XDF.values, yDF.values.flatten()) #flatten reshapes the 5000x1 column matrix to a 1D array,
#otherwise sklearn complains
The learned coefficients are below. Notice that we obtain a very high classification accuracy, but it is measured on the same 5000 images used for training (about 8% of the entire MNIST dataset), so the model is most likely overfitting.
T1=clf.coefs_[0]
T2=clf.coefs_[1]
print(T1.shape)
print(T2.shape)
print('theta_0 coefficients:')
print(clf.intercepts_[0].shape)
print(clf.intercepts_[1].shape)
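Note that sklearn stores these weights transposed relative to the theta matrices used earlier, with the bias terms kept separately in intercepts_. A sketch of reassembling them into the earlier convention (Theta1 and Theta2 are illustrative names):
#stack the intercepts as a first column to recover matrices with the same
#layout as theta1DF (25x401) and theta2DF (10x26)
Theta1=np.hstack([clf.intercepts_[0].reshape(-1,1),T1.transpose()])
Theta2=np.hstack([clf.intercepts_[1].reshape(-1,1),T2.transpose()])
print(Theta1.shape)
print(Theta2.shape)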
accuracy=(100*sum(clf.predict(XDF.values).reshape(yDF.values.shape) == yDF.values)
          /len(classification))[0]
print('classification accuracy: {}%'.format(accuracy))
#or more easily
clf.score(XDF.values, yDF.values)
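Since this score is computed on the training data, a held-out split gives a more honest estimate of generalization. A minimal sketch using sklearn's train_test_split (the 0.2 test fraction and random_state are arbitrary choices):
from sklearn.model_selection import train_test_split
#hold out 20% of the images and score on them instead of on the training set
X_train,X_test,y_train,y_test=train_test_split(XDF.values,yDF.values.flatten(),
                                               test_size=0.2,random_state=0)
clf2=MLPClassifier(solver='adam',alpha=1e-3,hidden_layer_sizes=(25,),
                   activation='logistic',validation_fraction=0.2)
clf2.fit(X_train,y_train)
clf2.score(X_test,y_test)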
A Keras DNN trained on the entire dataset can be seen here: Keras and the MNIST dataset